{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "http://cs229.stanford.edu/ps/ps1/ps1.pdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import display, HTML"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "<style>\n",
       "   div#notebook-container    { width: 100%; }\n",
       "   div#menubar-container     { width: 100%; }\n",
       "   div#maintoolbar-container { width: 100%; }\n",
       "</style>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(HTML(data=\"\"\"\n",
    "<style>\n",
    "   div#notebook-container    { width: 100%; }\n",
    "   div#menubar-container     { width: 100%; }\n",
    "   div#maintoolbar-container { width: 100%; }\n",
    "</style>\n",
    "\"\"\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# (a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given \n",
    "\n",
    "\\begin{align*} \n",
    "p(y) &= \\begin{cases}\n",
    "  \\phi          & \\text{if} \\; y = 1 \\\\\n",
    "  1 - \\phi     & \\text{if} \\; y = -1\n",
    "\\end{cases} \\\\\n",
    "p(x|y=1) &= \\frac{1}{(2\\pi)^{n/2} \\left|\\Sigma\\right|^{1/2}} \\exp\\Big(-\\frac{1}{2}(x - \\mu_{1})^T \\Sigma^{-1} (x - \\mu_{1}) \\Big) \\\\\n",
    "p(x|y=-1) &= \\frac{1}{(2\\pi)^{n/2} \\left|\\Sigma\\right|^{1/2}} \\exp\\Big(-\\frac{1}{2}(x - \\mu_{-1})^T \\Sigma^{-1} (x - \\mu_{-1}) \\Big) \\\\\n",
    "\\end{align*}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For clarity, rewrite the above with "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \n",
    "p(x|y=1)  &= C e^A \\\\\n",
    "p(x|y=-1) &= C e^B \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "where"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "A &= -\\frac{1}{2}(x - \\mu_{1})^T \\Sigma^{-1} (x - \\mu_{1}) \\\\\n",
    "B &= -\\frac{1}{2}(x - \\mu_{-1})^T \\Sigma^{-1} (x - \\mu_{-1}) \\\\\n",
    "C &= \\frac{1}{(2\\pi)^{n/2} \\left|\\Sigma\\right|^{1/2}}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \n",
    "p(y|x) &= \\frac{p(x|y) p(y)}{p(x)} \\\\\n",
    "           &= \\frac{p(x|y) p(y)}{p(x|y=1) p(y=1) + p(x|y=-1) p(y=-1)}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We evaluate the $y=1$ and $y=-1$ cases, separately"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When $y=1$,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \n",
    "p(y|x) &= \\frac{p(x|y=1) p(y=1)}{p(x|y=1) p(y=1) + p(x|y=-1) p(y=-1)} \\\\\n",
    "           &= \\frac{Ce^A \\phi}{Ce^A \\phi + Ce^B (1 - \\phi)} \\\\\n",
    "           &= \\frac{e^A \\phi}{e^A \\phi + e^B (1 - \\phi)} \\\\\n",
    "           &= \\frac{1}{1 + \\frac{1 - \\phi}{\\phi} e^{B - A}} \\\\\n",
    "           &= \\frac{1}{1 + e^{(B - A + \\ln \\frac{1 - \\phi}{\\phi})}} \\\\\n",
    "           &= \\frac{1}{1 + e^{- 1 (A -B + \\ln \\frac{\\phi}{1 - \\phi})}} \\\\\n",
    "           &= \\frac{1}{1 + e^{- y (A -B + \\ln \\frac{\\phi}{1 - \\phi})}} \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When $y = -1$,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \n",
    "p(y|x) &= \\frac{p(x|y=-1) p(y=-1)}{p(x|y=1) p(y=1) + p(x|y=-1) p(y=-1)} \\\\\n",
    "           &= \\frac{Ce^B (1 - \\phi)}{Ce^A \\phi + Ce^B (1 - \\phi)} \\\\\n",
    "           &= \\frac{1}{\\frac{\\phi}{1 - \\phi} e^{A - B} + 1} \\\\\n",
    "           &= \\frac{1}{1 + e^{(A - B + \\ln \\frac{\\phi}{1 - \\phi})}} \\\\\n",
    "           &= \\frac{1}{1 + e^{- y (A - B + \\ln \\frac{\\phi}{1 - \\phi})}} \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Therefore, in both cases,\n",
    "\n",
    "$$p(y|x) = \\frac{1}{1 + e^{- y (A - B + \\ln \\frac{\\phi}{1 - \\phi})}}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To show $p(y|x)$ is in the form of a logistic function:\n",
    "\n",
    "$$p(y|x) = \\frac{1}{1 + e^{-y(\\theta^T x + \\theta_0)}}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "we expand the exponent"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "A - B + \\ln \\frac{\\phi}{1 - \\phi} \n",
    "&= -\\frac{1}{2}(x - \\mu_{1})^T \\Sigma^{-1} (x - \\mu_{1}) + \\frac{1}{2}(x - \\mu_{-1})^T \\Sigma^{-1} (x - \\mu_{-1}) + \\ln \\frac{\\phi}{1 - \\phi}  \\\\\n",
    "&= \\frac{1}{2}(x - \\mu_{-1})^T \\Sigma^{-1} (x - \\mu_{-1}) -\\frac{1}{2}(x - \\mu_{1})^T \\Sigma^{-1} (x - \\mu_{1})  + \\ln \\frac{\\phi}{1 - \\phi}  \\\\\n",
    "&= \\frac{1}{2}\\big(x^T \\Sigma^{-1}x - x^T \\Sigma^{-1}\\mu_{-1} - \\mu_{-1}^T\\Sigma^{-1}x + \\mu_{-1}^T\\Sigma^{-1}\\mu_{-1} - x^T\\Sigma^{-1}x + x^T\\Sigma^{-1}\\mu_1 + \\mu_1^T\\Sigma^{-1}x - \\mu_1^T\\Sigma^{-1}\\mu_1 \\big) + \\ln \\frac{\\phi}{1 - \\phi} \\\\\n",
    "&= \\frac{1}{2}\\big(x^T \\Sigma^{-1} (\\mu_1 - \\mu_{-1}) + (\\mu_1 - \\mu_{-1})^T \\Sigma^{-1} x + \\mu_{-1}^T\\Sigma^{-1}\\mu_{-1} - \\mu_1^T\\Sigma^{-1}\\mu_1  \\big) + \\ln \\frac{\\phi}{1 - \\phi} \\\\\n",
    "&= (\\mu_1 - \\mu_{-1})^T \\Sigma^{-1} x + \\frac{1}{2} \\big( \\mu_{-1}^T\\Sigma^{-1}\\mu_{-1} - \\mu_1^T\\Sigma^{-1}\\mu_1  \\big) + \\ln \\frac{\\phi}{1 - \\phi} \\\\\n",
    "&= (\\mu_1 - \\mu_{-1})^T \\Sigma^{-1} x + D\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "where D is a constant. The 5th equality used the fact that $\\Sigma$ is a symmetric covariance matrix, thus  \n",
    "\n",
    "$$\n",
    "x^T \\Sigma^{-1} (\\mu_1 - \\mu_{-1}) = (\\mu_1 - \\mu_{-1})^T \\Sigma^{-1} x\n",
    "$$\n",
    "\n",
    "and \n",
    "\n",
    "$$\n",
    "x^T \\Sigma^{-1} (\\mu_1 - \\mu_{-1}) + (\\mu_1 - \\mu_{-1})^T \\Sigma^{-1} x = 2 (\\mu_1 - \\mu_{-1})^T \\Sigma^{-1} x\n",
    "$$\n",
    "\n",
    "Hence"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\theta^T &= (\\mu_1 - \\mu_{-1})^T \\Sigma^{-1} \\\\\n",
    "\\theta_0 &= D \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# (b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### First, derive the log-likelood function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the case when $x$ is one-dimensional,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \n",
    "p(y) &= \\begin{cases}\n",
    "  \\phi          & \\text{if} \\; y = 1 \\\\\n",
    "  1 - \\phi     & \\text{if} \\; y = -1\n",
    "\\end{cases} \\\\\n",
    "p(x|y=1) &= \\frac{1}{(2\\pi \\sigma^2)^{\\frac{1}{2}}} \\exp\\Big(-\\frac{(x - \\mu_1)^2}{2\\sigma^{2}}\\Big ) \\\\\n",
    "p(x|y=-1) &= \\frac{1}{(2\\pi \\sigma^2)^{\\frac{1}{2}}} \\exp\\Big(-\\frac{(x - \\mu_{-1})^2}{2\\sigma^{2}}\\Big ) \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The log-likelihood is"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \n",
    "l(\\phi, \\mu_{1}, \\mu_{-1}, \\sigma^2)\n",
    "    &= \\log \\prod_{i=1}^{m} p(x^{(i)}, y^{(i)}; \\phi, \\mu_{-1}, \\mu_1, \\sigma^2) \\\\\n",
    "    &= \\log \\prod_{i=1}^{m} p(x^{(i)}|y^{(i)}; \\phi, \\mu_{-1}, \\mu_1, \\sigma^2) p(y^{(i)}; \\phi) \\\\\n",
    "    &= \\log \\prod_{i=1}^{m} \\Big[\\mathbb{1}\\{y^{(i)} = 1\\}  \\frac{1}{(2\\pi \\sigma^2)^{\\frac{1}{2}}} \\exp\\Big(-\\frac{(x^{(i)} - \\mu_1)^2}{2\\sigma^{2}}\\Big ) \\phi + \\mathbb{1}\\{y^{(i)} = -1\\} \\frac{1}{(2\\pi \\sigma^2)^{\\frac{1}{2}}} \\exp\\Big(-\\frac{(x^{(i)} - \\mu_{-1})^2}{2\\sigma^{2}}\\Big ) (1 - \\phi) \\Big] \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Consider there are $m_1$ data point with $y=1$ and $m_{-1}$ with $y=-1$, i.e. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "m_1 &= \\sum_{i=1}^{m} \\mathbb{1}\\{y^{(i)} = 1\\}  \\\\\n",
    "m_{-1} &= \\sum_{i=1}^{m} \\mathbb{1}\\{y^{(i)} = -1\\}  \\\\\n",
    "m &= m_1 + m_{-1}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \n",
    "l(\\phi, \\mu_{1}, \\mu_{-1}, \\sigma^2)\n",
    "    &= \\log \\Bigg[ \\bigg( \\prod_{i=1}^{m_1}  \\frac{1}{(2\\pi \\sigma^2)^{\\frac{1}{2}}} \\exp\\Big(-\\frac{(x^{(i)} - \\mu_1)^2}{2\\sigma^{2}}\\Big ) \\phi \\bigg) \\bigg( \\prod_{j=1}^{m_{-1}} \\frac{1}{(2\\pi \\sigma^2)^{\\frac{1}{2}}} \\exp\\Big(-\\frac{(x^{(j)} - \\mu_{-1})^2}{2\\sigma^{2}}\\Big ) (1 - \\phi) \\bigg) \\Bigg ] \\\\\n",
    "    &= \\sum_{i=1}^{m}\\log \\Big( \\frac{1}{(2\\pi \\sigma^2)^{\\frac{1}{2}}} \\Big) + \\sum_{i=1}^{m_1} \\Big(-\\frac{(x^{(i)} - \\mu_1)^2}{2\\sigma^{2}}\\Big ) + \\sum_{i=1}^{m_1} \\log \\phi + \\sum_{i=1}^{m_{-1}} \\Big(-\\frac{(x^{(i)} - \\mu_{-1})^2}{2\\sigma^{2}}\\Big ) + \\sum_{i=1}^{m_{-1}}\\log(1 - \\phi) \\\\\n",
    "   &= \\sum_{i=1}^{m} -\\frac{1}{2} \\log(2\\pi \\sigma^2) - \\sum_{i=1}^{m_1} \\frac{(x^{(i)} - \\mu_1)^2}{2\\sigma^{2}} - \\sum_{i=1}^{m_{-1}} \\frac{(x^{(i)} - \\mu_{-1})^2}{2\\sigma^{2}} + \\sum_{i=1}^{m_1} \\log \\phi  + \\sum_{i=1}^{m_{-1}}\\log(1 - \\phi) \\\\\n",
    "   &= -\\frac{m}{2} \\log(2\\pi \\sigma^2) - \\sum_{i=1}^{m_1} \\frac{(x^{(i)} - \\mu_1)^2}{2\\sigma^{2}} - \\sum_{i=1}^{m_{-1}} \\frac{(x^{(i)} - \\mu_{-1})^2}{2\\sigma^{2}} + m_1 \\log \\phi  + m_{-1}\\log(1 - \\phi) \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Second, maximize the log-likelihood"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calculate $\\frac{\\partial l}{\\partial \\phi}$,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \n",
    "\\frac{\\partial l}{\\partial \\phi}\n",
    "    &= \\frac{m_1}{\\phi} - \\frac{m_{-1}}{1 - \\phi} \\\\\n",
    "\\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set it to 0,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \n",
    "\\frac{m_1}{\\phi} - \\frac{m_{-1}}{1 - \\phi} & = 0 \\\\\n",
    "m_1 (1 - \\phi) - m_{-1} \\phi & = 0 \\\\\n",
    "m_1 - (m_1 + m_{-1}) \\phi &= 0 \\\\\n",
    "m_1 - (m) \\phi &= 0 \\\\\n",
    "\\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\\phi = \\frac{m_1}{m} = \\frac{1}{m}\\sum_{i=1}^{m} \\mathbb{1}\\{y^{(i)} = 1\\}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calculate $\\frac{\\partial l}{\\partial \\mu_1}$,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \n",
    "\\frac{\\partial l}{\\partial \\mu_1}\n",
    "    &=  -\\frac{1}{2 \\sigma ^2}  \\sum_{i=1}^{m_1} 2(x^{(i)} - \\mu_1) \\cdot (-1) \\\\\n",
    "    &=  \\frac{1}{\\sigma ^2}  \\sum_{i=1}^{m_1} (x^{(i)} - \\mu_1) \\\\\n",
    "    &=  \\frac{1}{\\sigma ^2}  \\sum_{i=1}^{m_1} x^{(i)} - \\sum_{i=1}^{m_1} \\mu_1 \\\\\n",
    "    &=  \\frac{1}{\\sigma ^2} \\bigg( \\sum_{i=1}^{m} \\mathbb{1}\\{y^{(i)} = 1\\}  x^{(i)} - \\sum_{i=1}^{m} \\mathbb{1}\\{y^{(i)} = 1\\}  \\mu_1 \\bigg) \\\\\n",
    "\\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set it to 0, we obtain"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$ \\mu_1 = \\frac{\\sum_{i=1}^{m} \\mathbb{1}\\{y^{(i)} = 1\\}  x^{(i)}}{\\sum_{i=1}^{m} \\mathbb{1}\\{y^{(i)} = 1\\}}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By the same token (or via symmetry since the label of $1$ and $-1$ could be interchangeable),"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$ \\mu_{-1} = \\frac{\\sum_{i=1}^{m} \\mathbb{1}\\{y^{(i)} = -1\\}  x^{(i)}}{\\sum_{i=1}^{m} \\mathbb{1}\\{y^{(i)} = -1\\}}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calculate $\\frac{\\partial l}{\\partial \\sigma^2} = 0$ (Note we are doing derivative w.r.t. $\\sigma^2$ instead of $\\sigma$),"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \n",
    "\\frac{\\partial l}{\\partial \\sigma^2}\n",
    "    &= -\\frac{m}{2} \\frac{1}{2\\pi \\sigma^2} 2\\pi - \\sum_{i=1}^{m_1} \\frac{(x^{(i)} - \\mu_1)^2}{2} \\big(- (\\sigma^{2})^{-2} \\big) - \\sum_{i=1}^{m_{-1}} \\frac{(x^{(i)} - \\mu_{-1})^2}{2} \\big(- (\\sigma^{2})^{-2} \\big) \\\\\n",
    "    &= -\\frac{m}{2 \\sigma^2} + \\sum_{i=1}^{m_1} \\frac{(x^{(i)} - \\mu_1)^2}{2\\sigma^{4}} + \\sum_{i=1}^{m_{-1}} \\frac{(x^{(i)} - \\mu_{-1})^2}{2\\sigma^4} \\\\\n",
    "\\end{align*} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set it to 0,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \n",
    "\\frac{m}{2 \\sigma^2} &= \\sum_{i=1}^{m_1} \\frac{(x^{(i)} - \\mu_1)^2}{2 \\sigma^{4}} + \\sum_{i=1}^{m_{-1}} \\frac{(x^{(i)} - \\mu_{-1})^2}{2 \\sigma^4} \\\\\n",
    "m \\sigma^2 &= \\sum_{i=1}^{m_1} (x^{(i)} - \\mu_1)^2 + \\sum_{i=1}^{m_{-1}} (x^{(i)} - \\mu_{-1})^2 \\\\\n",
    "                                   &= \\sum_{i=1}^{m} (x^{(i)} - \\mu_{y{(i)}})^2 \\\\\n",
    "\\sigma^2 &= \\frac{1}{m} \\sum_{i=1}^{m} (x^{(i)} - \\mu_{y{(i)}})^2\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# (c)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the more general multi-dimensional cases,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \n",
    "p(y) &= \\begin{cases}\n",
    "  \\phi          & \\text{if} \\; y = 1 \\\\\n",
    "  1 - \\phi     & \\text{if} \\; y = -1\n",
    "\\end{cases} \\\\\n",
    "p(x|y=1) &= \\frac{1}{(2\\pi)^{n/2} \\left|\\Sigma\\right|^{1/2}} \\exp\\Big(-\\frac{1}{2}(x - \\mu_{1})^T \\Sigma^{-1} (x - \\mu_{1}) \\Big) \\\\\n",
    "p(x|y=-1) &= \\frac{1}{(2\\pi)^{n/2} \\left|\\Sigma\\right|^{1/2}} \\exp\\Big(-\\frac{1}{2}(x - \\mu_{-1})^T \\Sigma^{-1} (x - \\mu_{-1}) \\Big) \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similar to what has been done in the one-dimensional case, the log-likelihood is"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \n",
    "l(\\phi, \\mu_{1}, \\mu_{-1}, \\sigma^2)\n",
    "&= \\log \\Bigg[ \\bigg( \\prod_{i=1}^{m_1}  \\frac{1}{(2\\pi)^{n/2} \\left|\\Sigma\\right|^{1/2}}  \\exp\\Big(-\\frac{1}{2}(x^{(i)} - \\mu_{1})^T \\Sigma^{-1} (x^{(i)} - \\mu_{1}) \\Big) \\phi \\bigg) \\bigg( \\prod_{j=1}^{m_{-1}} \\frac{1}{(2\\pi)^{n/2} \\left|\\Sigma\\right|^{1/2}}  \\exp\\Big(-\\frac{1}{2}(x^{(j)} - \\mu_{-1})^T \\Sigma^{-1} (x^{(j)} - \\mu_{-1}) \\Big) (1 - \\phi) \\bigg) \\Bigg] \\\\\n",
    "&= -\\sum_{i=1}^m \\log \\Big( (2\\pi)^{n/2} \\left|\\Sigma\\right|^{1/2} \\Big) - \\sum_{i=1}^{m_1} \\Big(\\frac{1}{2}(x^{(i)} - \\mu_{1})^T \\Sigma^{-1} (x^{(i)} - \\mu_{1}) \\Big) - \\sum_{i=1}^{m_{-1}} \\Big(\\frac{1}{2}(x^{(i)} - \\mu_{-1})^T \\Sigma^{-1} (x^{(i)} - \\mu_{-1}) \\Big) + m_1 \\log \\phi  + m_{-1}\\log(1 - \\phi)  \\\\\n",
    "&= -m \\Big(\\frac{n}{2}\\log(2\\pi) + \\log(\\left|\\Sigma\\right|^{1/2}) \\Big) - \\sum_{i=1}^{m_1} \\Big(\\frac{1}{2}(x^{(i)} - \\mu_{1})^T \\Sigma^{-1} (x^{(i)} - \\mu_{1}) \\Big) - \\sum_{i=1}^{m_{-1}} \\Big(\\frac{1}{2}(x^{(i)} - \\mu_{-1})^T \\Sigma^{-1} (x^{(i)} - \\mu_{-1}) \\Big) + m_1 \\log \\phi  + m_{-1}\\log(1 - \\phi) \\\\\n",
    "&= - \\frac{mn}{2}\\log{(2\\pi)} - \\frac{m}{2}\\log(\\left|\\Sigma\\right|) - \\sum_{i=1}^{m_1} \\Big(\\frac{1}{2}(x^{(i)} - \\mu_{1})^T \\Sigma^{-1} (x^{(i)} - \\mu_{1}) \\Big) - \\sum_{i=1}^{m_{-1}} \\Big(\\frac{1}{2}(x^{(i)} - \\mu_{-1})^T \\Sigma^{-1} (x^{(i)} - \\mu_{-1}) \\Big) + m_1 \\log \\phi  + m_{-1}\\log(1 - \\phi)\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The second equal sign is analogous to the one dimensional case, the 3rd and 4th equal signs expands the first part"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The derivation of $\\phi$ is exactly the same as in the one-dimensional case in **(b)**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the following result proved in Problem-set 0, $\\nabla_x x^TAx = 2Ax$,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calculate $\\nabla_{\\mu_1} l$,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \n",
    "\\nabla_{\\mu_1} l &= - \\sum_{i=1}^{m_1}\\frac{1}{2} \\cdot 2 \\Sigma^{-1} (x^{(i)} - \\mu_1)(-1) \\\\\n",
    "                           &= \\Sigma^{-1}\\sum_{i=1}^{m_1}(x^{(i)} - \\mu_1) \\\\\n",
    "                           &= \\Sigma^{-1} \\bigg( \\sum_{i=1}^{m} \\mathbb{1}\\{y^{(i)} = 1\\}  x^{(i)} - \\sum_{i=1}^{m} \\mathbb{1}\\{y^{(i)} = 1\\}  \\mu_1 \\bigg)\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Comparing to the one dimensional case, they are very analogous (they should be) except that $\\frac{1}{\\sigma^2}$ becomes $\\Sigma^{-1}$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set it to 0,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$ \\mu_1 = \\frac{\\sum_{i=1}^{m} \\mathbb{1}\\{y^{(i)} = 1\\}  x^{(i)}}{\\sum_{i=1}^{m} \\mathbb{1}\\{y^{(i)} = 1\\}}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By symmetry, we obtain"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$ \\mu_{-1} = \\frac{\\sum_{i=1}^{m} \\mathbb{1}\\{y^{(i)} = -1\\}  x^{(i)}}{\\sum_{i=1}^{m} \\mathbb{1}\\{y^{(i)} = -1\\}}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "According to http://cs229.stanford.edu/section/cs229-linalg.pdf:\n",
    "\n",
    "\\begin{align*} \n",
    "\\nabla_{A} \\log|A| &= \\frac{1}{|A|} \\nabla_{A}|A| = A^{-1}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and according to Eq. 13.42 in https://people.eecs.berkeley.edu/~jordan/courses/260-spring10/other-readings/chapter13.pdf:\n",
    "\n",
    "\\begin{align*} \n",
    "\\nabla_{A} x^T A x &= xx^T \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With these two pieces of results, let's do $\\nabla_{\\Sigma^{-1}}l$ instead of $\\nabla_{\\Sigma} l$, by part:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Part 1: "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \n",
    "\\nabla_{\\Sigma^{-1}} \\Big( - \\frac{m}{2}\\log(\\left|\\Sigma\\right|) \\Big)\n",
    "    &= \\nabla_{\\Sigma^{-1}} \\Big(- \\frac{m}{2}\\log \\Big(\\frac{1}{\\left|\\Sigma^{-1} \\right|} \\big) \\Big) \\\\\n",
    "    &= \\nabla_{\\Sigma^{-1}} (\\frac{m}{2}\\log(\\left|\\Sigma^{-1}\\right|))  \\\\\n",
    "    &= \\frac{m}{2}(\\Sigma^{-1})^{-1} \\\\\n",
    "    &= \\frac{m}{2}\\Sigma\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, the first equality used the property that the determinant of inverse is equal to the inverse of determinant. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Part 2:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*} \n",
    "  &\\quad \\nabla_{\\Sigma^{-1}} \\bigg( - \\sum_{i=1}^{m_1} \\Big(\\frac{1}{2}(x^{(i)} - \\mu_{1})^T \\Sigma^{-1} (x^{(i)} - \\mu_{1}) \\Big) - \\sum_{i=1}^{m_{-1}} \\Big(\\frac{1}{2}(x^{(i)} - \\mu_{-1})^T \\Sigma^{-1} (x^{(i)} - \\mu_{-1}) \\Big) \\bigg )  \\\\\n",
    "    &= - \\sum_{i=1}^{m_1} \\Big(\\frac{1}{2}(x^{(i)} - \\mu_{1}) (x^{(i)} - \\mu_{1})^T \\Big) - \\sum_{i=1}^{m_{-1}} \\Big(\\frac{1}{2}(x^{(i)} - \\mu_{-1}) (x^{(i)} - \\mu_{-1})^T \\Big)   \\\\\n",
    "    &= - \\frac{1}{2} \\sum_{i=1}^{m} (x^{(i)} - \\mu_{y^{(i)}}) (x^{(i)} - \\mu_{y^{(i)}})^T   \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Aggregate the two parts and set it to 0,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\frac{m}{2}\\Sigma - \\frac{1}{2} \\sum_{i=1}^{m} (x^{(i)} - \\mu_{y^{(i)}}) (x^{(i)} - \\mu_{y^{(i)}})^T &= 0 \\\\\n",
    "\\Sigma = \\frac{1}{m} \\sum_{i=1}^{m} (x^{(i)} - \\mu_{y^{(i)}}) (x^{(i)} - \\mu_{y^{(i)}})^T\n",
    "\\end{align*}"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
